Rates: PCA

Author

Tom Williams

Published

August 8, 2023

In this post, I’ll demonstrate how to use PCA to extract a set of yield curve factors from historic market data.

I’ll then provide an intuitive interpretation for the factors in terms of the level, slope, and curvature of the yield curve through time.

I’ll conclude an example of much the shape of the extracted factors can vary through time, and the implications of this w.r.t. next steps.

Setup

import numpy
import pandas
import jax
import jax.numpy

import xtuples as xt
import xfactors as xf
c:\python39\lib\site-packages\scipy\__init__.py:146: UserWarning:

A NumPy version >=1.16.5 and <1.23.0 is required for this version of SciPy (detected version 1.25.2

Data

We’ll start by loading some yield data for US and DE swap and bond curves.

Returns
dfs_curves = xf.bt.data.curves.curve_dfs(
    curves=xt.iTuple([
        "YCSW0023",
        "YCGT0025",
        "YCSW0045",
        "YCGT0016",
    ]),
    dp="../xfactors/__local__/csvs"
)
dfs_curves = {
    curve: xf.utils.dfs.apply_na_threshold(
        df, na_threshold=(0., 0.,), na_padding=(0.2, 0.4,)
    )
    for curve, df in dfs_curves.items()
}

Where, here, we filter to only those rows containing data for at least … of tenors, out of those tenors containing data for at least … of rows.

For an example using the full data-set, including missing entries, see here for example.

This gives us back, US bond yields back to 2010 (with the 2005-2010 period filterd for lack of data):

USD-G Yields: 2005-2023
xf.bt.data.curves.curve_chart(
    dfs_curves, 
    xf.utils.dates.y(2005), 
    xf.utils.dates.y(2023), 
    "USD-G"
)
c:\python39\lib\site-packages\_plotly_utils\basevalidators.py:107: FutureWarning:

The behavior of DatetimeProperties.to_pydatetime is deprecated, in a future version this will return a Series containing python datetime objects instead of an ndarray. To retain the old behavior, call `np.array` on the result

And US swap yields for the full period 2005 to 2023:

USD-S Yields: 2005-2023
xf.bt.data.curves.curve_chart(
    dfs_curves, 
    xf.utils.dates.y(2005),
    xf.utils.dates.y(2023),
    "USD-S"
)
c:\python39\lib\site-packages\_plotly_utils\basevalidators.py:107: FutureWarning:

The behavior of DatetimeProperties.to_pydatetime is deprecated, in a future version this will return a Series containing python datetime objects instead of an ndarray. To retain the old behavior, call `np.array` on the result

As you can see, we have significantly more tenors available in our swap curve.

PCA

PCA decomposes a given data matrix into a matrix of orthonormal factor loadings, and a matrix of independent unit-gaussian factor embeddings (see the linked posts for more detail).

There are roughly two ways to do this:

  • Eigen-decomposition of the input data covariance matrix.
  • Constrained minimisation of the re-constructed data matrix (following a round-trip encoding into factor space, and decoding back up into the original data space).

For now, we’ll go with the first of these, see here for background on the second, and here for an example using the same rates data as here.

Sector Weights Chart
import functools
@functools.lru_cache(maxsize=10)
def fit_pca(curve, d_start, d_end, n):
    df = dfs_curves[curve]
    df = xf.utils.dfs.index_date_filter(
        df, date_start=d_start, date_end=d_end
    )
    eigvals, eigvecs = xf.nodes.pca.vanilla.PCA.f(
        jax.numpy.transpose(df.values), n = n
    )
    eigvecs = xf.utils.funcs.set_signs_to(
        eigvecs.real, axis=1, i=0, signs=[1, -1, -1]
    )
    return df, eigvals.real, eigvecs

We can plot the resulting factor loadings as bar charts.

Sector Weights Chart
def pca_weights_chart(curve, d_start, d_end, n):
    df, eigvals, eigvecs = fit_pca(curve, d_start, d_end, n)
    weights = pandas.DataFrame(
        eigvecs.T,
        columns=df.columns,
        index=list(range(eigvals.shape[0]))
    )
    return xf.visuals.graphs.df_facet_bar_chart(
        xf.utils.dfs.melt_with_index(weights, index_as="factor"),
        x="variable",
        y="value",
        facet="factor",
        title="{}: {} - {}".format(curve, d_start, d_end)
    )

And the resulting embeddings as time series, in a similar form to the above.

Sector Weights Chart
def pca_path_chart(curve, d_start, d_end, n):
    df, eigvals, eigvecs = fit_pca(curve, d_start, d_end, n)
    factors = jax.numpy.matmul(df.values, eigvecs)
    factors_df = pandas.DataFrame(
        factors,
        columns=list(range(eigvals.shape[0])),
        index=df.index,
    )
    return xf.visuals.graphs.df_facet_line_chart(
        xf.utils.dfs.melt_with_index(factors_df, index_as="date"),
        x="date",
        y="value",
        facet="variable",
        title="{}: {} - {}".format(curve, d_start, d_end)
    )

Factors - Loadings

For instance, below are the first three factors extracted from a history of the USD government bond curve spanning from 2005 through to April 2023:

Factor Loadings: USD-G
pca_weights_chart("USD-G", xf.utils.dates.y(2005), xf.utils.dates.y(2023), 3)
No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)
c:\python39\lib\site-packages\jax\_src\numpy\lax_numpy.py:3662: UserWarning:

'kind' argument to argsort is ignored; only 'stable' sorts are supported.

As one can see, there is a clear structure to the three components.

The first ‘level’ component - capturing the majority of the variance of our yield curve - has loadings with all the same sign, downward sloping from a peak amongst the first few tenors, and is conventionally positive.

The second component slopes (roughly) monotonically from one sign to another (conventionally from negative to positive), with a single zero point, and is known as the ‘slope’ factor.

The third generally has two zero points, with the front- and back-end of the curve weighted counter to the belly, and is referred to as the ‘curvature’.

As one can see, we get a very similar result from the USD swap curve:

Factor Loadings - USD-S
pca_weights_chart("USD-S", xf.utils.dates.y(2005), xf.utils.dates.y(2023), 3)
c:\python39\lib\site-packages\jax\_src\numpy\lax_numpy.py:3662: UserWarning:

'kind' argument to argsort is ignored; only 'stable' sorts are supported.

Factors - Embeddings

Let’s then plot the path of the factor embeddings (here, for the US swaps curve):

Sector Weights Chart
pca_path_chart("USD-S", xf.utils.dates.y(2005), xf.utils.dates.y(2023), 3)
c:\python39\lib\site-packages\_plotly_utils\basevalidators.py:107: FutureWarning:

The behavior of DatetimeProperties.to_pydatetime is deprecated, in a future version this will return a Series containing python datetime objects instead of an ndarray. To retain the old behavior, call `np.array` on the result

One can clearly see the 2005, 2018 and 2022 hiking cycles, and the 07-08 post-GFC rate cuts, in the level component.

The 2022-23 flattening is very clearly visible in the slope component (blunting the rise in the level component on the back end of the curve).

We can also very clearly see the steady march flatter from 2013 through to covid, as back-end swap rates first compressed in through to 2016, before the front-end then rose up to meet them into the 2018 Powell pivot.

One can arguably also see evidence of intermittent flattening bouts (roughly) aligning with the 08-10 and 10-11 and 12-13 rounds of quantitative easing (with QE designed to push down the back-end of the curve).

Bonds vs Swaps

As one can see, the factor paths are highly correlated between the two USD curves we have.

Here, the government curve history:

Sector Weights Chart
pca_path_chart("USD-G", xf.utils.dates.y(2005), xf.utils.dates.y(2023), 3)
c:\python39\lib\site-packages\_plotly_utils\basevalidators.py:107: FutureWarning:

The behavior of DatetimeProperties.to_pydatetime is deprecated, in a future version this will return a Series containing python datetime objects instead of an ndarray. To retain the old behavior, call `np.array` on the result

And, here, the USD swaps curve decomposition for the same time range (as we have less complete data for the bond history - see here for an excercise in interpolating the rest using matrix completion):

Sector Weights Chart
pca_path_chart("USD-S", xf.utils.dates.y(2010), xf.utils.dates.y(2023), 3)
c:\python39\lib\site-packages\_plotly_utils\basevalidators.py:107: FutureWarning:

The behavior of DatetimeProperties.to_pydatetime is deprecated, in a future version this will return a Series containing python datetime objects instead of an ndarray. To retain the old behavior, call `np.array` on the result

Where the key difference that I would highlight is the relative difference in flattening going into 2023, with the government curve remaining relatively steeper (ie. with longer dated bonds proportionately wider than the equivalent swaps).

Discussion

Let’s try running the same decomposition, but on a smaller subset of our yield history.

For instance, on our swap curve, but between only 2010 and 2015:

Sector Weights Chart
pca_weights_chart("USD-S", xf.utils.dates.y(2010), xf.utils.dates.y(2015), 3)

As one can see, the shape of level component has both almost entirely inverted, and has more or completely flattened at the front end.

Furthermore, our slope component is no longer monotonic, having developed a peak of it’s own around the 18 month point.

We get a similar result from the bond curve:

Sector Weights Chart
pca_weights_chart("USD-G", xf.utils.dates.y(2010), xf.utils.dates.y(2015), 3)

Here, the level component is even slightly negative at the front-end.

These loadings make some sense: during this period fed rates were firmly anchored at the zero lower bound, so there was relatively little market volatility in front-end rates.

You can see this if you scroll back up to the yield curve time series above (observe the first few tenors pinned close to zero).

As such, the first two components - each of which simply captures co-variation in yields - have very small loadings on the front end (as these tenors simply didn’t move very much during this period).

We also see something very similar in the EUR swaps factors for 2017 through 2019:

Sector Weights Chart
pca_weights_chart("EUR-S", xf.utils.dates.y(2017), xf.utils.dates.y(2019), 3)
c:\python39\lib\site-packages\jax\_src\numpy\lax_numpy.py:3662: UserWarning:

'kind' argument to argsort is ignored; only 'stable' sorts are supported.

Which, again, makes sense if you refer back to the relevant period in the yield curve history:

Sector Weights Chart
xf.bt.data.curves.curve_chart(dfs_curves, xf.utils.dates.y(2005), xf.utils.dates.y(2023), "EUR-S")
c:\python39\lib\site-packages\_plotly_utils\basevalidators.py:107: FutureWarning:

The behavior of DatetimeProperties.to_pydatetime is deprecated, in a future version this will return a Series containing python datetime objects instead of an ndarray. To retain the old behavior, call `np.array` on the result

And, to some extent, in the German government bond curve from 2012 to 2016:

Sector Weights Chart
pca_weights_chart("EUR-DE", xf.utils.dates.y(2012), xf.utils.dates.y(2016), 3)
c:\python39\lib\site-packages\jax\_src\numpy\lax_numpy.py:3662: UserWarning:

'kind' argument to argsort is ignored; only 'stable' sorts are supported.

Which, again, makes sense given the relative volatility of the front and long end of the curve during this period:

Sector Weights Chart
xf.bt.data.curves.curve_chart(dfs_curves, xf.utils.dates.y(2005), xf.utils.dates.y(2023), "EUR-DE")
c:\python39\lib\site-packages\_plotly_utils\basevalidators.py:107: FutureWarning:

The behavior of DatetimeProperties.to_pydatetime is deprecated, in a future version this will return a Series containing python datetime objects instead of an ndarray. To retain the old behavior, call `np.array` on the result

These are, admitteldy, particularly striking examples of factor instability - coinciding as they did with periods of especially unusual central bank monetary policy.

That being said, it is nonetheless clear that, without further prior structure imposed on the shape of our components, the factors extracted by PCA can vary significantly through time (though, less than that observed from equivalent analysis in equity markets).

Conclusion

PCA is a useful first step in decomposing a yield curve into a smaller number of more manageable factors.

However, vanilla PCA - particularly in the eigen-decomposition form we used above - suffers from a couple of key limitations:

  • incompatibility with missing data (hence the truncated history of our USD bond decomposition).

  • factor loading instability, given the relatively un-opionated nature of the decomposition (diagonal variance maximisation).

We can naively deal with the first of these by calculating our covariance matrix pair-wise, in a way that’s robust to missing data, as in here.

However, this can lead to inaccurate covariance estimates for tenors under-represented in our dataset. This can be somewhat improved upon by moving to a covariance kernel defined over tenor space, however we’re still unable to calculate the embedding values for dates with missing tenor yields.

An alternative approach, which does allow for latent embedding predictions over dates with missing yields, is to use matrix completion, in which we fit our factor loadings using a constrained L2 minimisation rather than eigen-decomposition.

This, then, can be naturally combined with our covariance tenor-kernel, to allow us to make yield predictions for entirely unseen new tenors.

A final step, and one that naturally deals with both missing data and the second limitation (of factor loading instability), is to move away from discrete loading vectors entirely, and to instead use parametric factors function, on which we can impose a greater deal of prior knowledge (again, fit by a constrained l2 minimisation).